EXECUTIVE SUMMARY

State education rankings published by U.S. News
& World Report, Education Week, and others play
a prominent role in legislative debate and public
discourse concerning education. These rankings
are based partly on achievement tests, which 
measure student learning, and partly on other factors not 
directly related to student learning. When achievement 
tests are used as measures of learning in these convention- 
al rankings, they are aggregated in a way that provides mis- 
leading results. To overcome these deficiencies, we create 
a new ranking of state education systems using demo- 
graphically disaggregated achievement data and exclud- 
ing less informative factors that are not directly related 
to learning. Using our methodology changes the order of 
state rankings considerably. Many states in New England 
and the Upper Midwest fall in the rankings, whereas many
states in the South and Southwest score much higher
than they do in conventional rankings. Furthermore, we 
create another set of rankings on the efficiency of educa- 
tion spending. In these efficiency rankings, achieving 
successful outcomes while economizing on education 
expenditures is considered better than doing so through 
lavish spending. These efficiency rankings cause a further 
increase in the rankings of southern and western states 
and a decline in the rankings of northern states. Finally, 
our regression results indicate that unionization has a 
powerful negative influence on educational outcomes, and 
that, given current spending levels, additional spending 
has little effect. We also find no evidence of a relationship 
between student performance and teacher-pupil ratios or 
private school enrollment, but some evidence that charter 
school enrollment has a positive effect. 

INTRODUCTION

Which states have the best K-12 education 
systems? What set of government policies and 
education spending levels is needed to achieve 
targeted outcomes in an efficient manner? An- 
swers to these important questions are essen- 
tial to the performance of our economy and 
country. Local workforce education and quality 
of schools are key determinants in business and 
residential location decisions. Determining 
which education policies are most cost-effective 
is also crucial for state and local politicians as 
they allocate limited taxpayer resources. 

Several organizations rank state K-12 educa- 
tion systems, and these rankings play a promi- 
nent role in both legislative debate and public 
discourse concerning education. The most 
popular are arguably those of U.S. News & World
Report (U.S. News). It is common for activists
and pundits (whether in favor of homeschool- 
ing, stronger teacher unions, core standards, 
etc.) to use these rankings to support their ar- 
guments for changes in policy or spending pri- 
orities. As shown by the recent competition for 
Amazon’s HQ2 (second headquarters), politi- 
cians and business leaders will also frequently 
cite education rankings to highlight their states’ 
advantages. Recent teacher strikes across the
country have likewise drawn renewed atten-
tion to education policy, and journalists inevi-
tably mention state rankings when these topics
arise. It is therefore important to ensure that
such rankings accurately reflect performance. 

Though well-intentioned, most existing 
rankings of state K-12 education are unreliable 
and misleading. The most popular and influen- 
tial state education rankings fail to provide an 
“apples to apples” comparison between states.
By treating states as though they had identical 
students, they ignore the substantial variation 
present in student populations across states. 
Conventional rankings also include data that 
are inappropriate or irrelevant to the educa- 
tional performance of schools. Finally, these 
analyses disregard government budgetary con- 
straints. Not surprisingly, using disaggregated 
measures of student learning, removing inap- 
propriate or irrelevant variables, and examining 


the efficiency of educational spending reorders 
state rankings in fundamental ways. As we show 
in this report, employing our improved ranking 
methodology overturns the apparent consen-
sus that schools in the South and Southwest
perform less well than schools in the Northeast
and Upper Midwest. It also puts to rest the
claim that more spending necessarily improves
student performance.

Many rankings, including those of U.S.
News, provide average scores on tests admin-
istered by the National Assessment of Educa-
tional Progress (NAEP), sometimes referred
to as “the nation’s report card.” The NAEP
reports provide average scores for various sub-
jects, such as math, reading, and science, for
students at various grade levels. These scores
are supposed to measure the degree to which 
students understand these subjects. While 
U.S. News includes other measures of educa-
tion quality, such as graduation rates and SAT
and ACT college entrance exam scores, direct
measures of the entire student population’s
understanding of academic subject matter,
such as those from the NAEP, are the most
appropriate measures of success for an edu-
cational system. Whereas graduation is not
necessarily an indication of actual learning, 
and only those students wishing to pursue a 
college degree tend to take standardized tests 
like the SAT and ACT, NAEP scores provide 
standardized measures of learning covering 
the entire student population. Focusing on 
NAEP data thus avoids selection bias while 
more closely measuring a school system’s abil- 
ity to improve actual student performance. 

However, student heterogeneity is ignored 
by U.S. News and most other state rankings that
use NAEP data as a component of their rank-
ings. Students from different socioeconomic
and ethnic backgrounds tend to perform dif- 
ferently (regardless of the state they are in). As 
this report will show, such aggregation often 
renders conventional state rankings as little 
more than a proxy for a jurisdiction’s demogra- 
phy. This problem is all the more unfortunate 
because it is so easily avoided. NAEP provides 
demographic breakdowns of student scores by
state. Ignoring these breakdowns substantially
skews the current rankings.

Perhaps just as problematic, some educa- 
tion rankings conflate inputs and outputs. 
For instance, Education Week uses per pupil 
expenditures as a component in its annual 
rankings. When direct measures of student
achievement are used, such as NAEP scores,
it is a mistake to include inputs, such as edu-
cational expenditures, as a separate factor.
Doing so gives extra credit to states that 
spend excessively to achieve the same level of 
success others achieve with fewer resources, 
when that wasteful extra spending should in- 
stead be penalized in the rankings. 

Our main goal in this report is to provide a 
ranking of public school systems in U.S. states 
that more accurately reflects the learning that 
is taking place. We attempt to move closer to 
a “value added” approach as explained in the 
following hypothetical. Consider one school 
system where every student knows how to 
read upon entering kindergarten. Compare 
this to a second school system where students 
don't have this skill upon entering kindergar- 
ten. It should come as no surprise if, by the end 
of first grade, the first school’s students have 
better reading scores than the second school’s. 
But if the second school’s students improved 
more, relative to their initial situation, a value- 
added approach would conclude that the 
second system actually did a better job. The 
value-added approach tries to capture this by 
measuring improvement rather than absolute 
levels of education achievement. Although 
the ranking presented here does not directly 
measure value added, it captures the concept 
more closely than do previous rankings by ac- 
counting for the heterogeneity of students 
who presumably enter the school system with 
different skills. Our approach is thus a better 
way to gauge performance. 

Moreover, this report will consider the 
importance of efficiency in a world of scarce 
resources. Our final rankings will rate states 
according to how much learning similar stu- 
dents have relative to the amount of resources 
used to achieve it. 


THE IMPACT OF HETEROGENEITY 

Students arrive in class on the first day of
school with different backgrounds, skills, and 
life experiences, often related to socioeco- 
nomic status. Assuming away these differenc- 
es, as most state rankings implicitly do, may 
lead analysts to attribute too much of the vari- 
ation in state educational outcomes to school 
systems instead of to student characteristics. 
Taking student characteristics into account is 
one of the fundamental improvements made 
by our state rankings. 

An example drawn from NAEP data il- 
lustrates how failing to account for student 
heterogeneity can lead to grossly misleading 
results. (For a more general demonstration of 
how heterogeneity affects results, see the Ap-
pendix.) According to U.S. News, Iowa ranks
8th and Texas ranks 33rd in terms of pre-K-12
quality. U.S. News includes only NAEP eighth-
grade math and reading scores as components
in its ranking, and Iowa leads Texas in both. When
fourth-grade scores and the NAEP science tests
are also included, the comparison between
Iowa and Texas remains largely unchanged.
Iowa students still do better than Texas stu- 
dents, but now in all six tests reported for those 
states (math, reading, and science in fourth and 
eighth grades). To use a baseball metaphor, this 
looks like a shut-out in Iowa’s favor. 

But this is not an apples-to-apples compari- 
son. The characteristics of Texas students are 
very different from those of Iowa students; 
Iowa’s student population is predominantly 
white, while Texas’s is much more ethnically 
diverse. NAEP data include average test scores 
for various ethnic groups. Using the four most 
populous ethnic groups (white, black, His-
panic, and Asian), at two grade levels (fourth
and eighth), and three subject-area tests (math,
reading, science), there are 24 disaggregated
scores that could, in principle, be compared be-
tween the two states in 2017. This is far more
than just the two comparisons (eighth-grade
reading and math) that U.S. News considers.

Given that Iowa students outscore their 
Texas counterparts on each of the three tests 
in both fourth and eighth grades, one might 
reasonably expect that most of the disag-
gregated groups of Iowa students would also
outscore their Texas counterparts in most of 
the twenty exams given in both states. But 
the exact opposite is the case. In fact, Texas 
students outscore their Iowa counterparts in 
all but one of the disaggregated comparisons. 
The only instance where Iowa students beat 
their Texas counterparts is the reading test 
for eighth grade Hispanic students. This is 
indeed a near shut-out, but one in Texas’s fa- 
vor, not Iowa’s. 

Let that sink in. Texas whites do better 
than Iowa whites in each subject test for each 
grade level. Similarly, Texas blacks do bet- 
ter than Iowa blacks in each subject test and 
grade level. Texas Hispanics do better than 
Iowa Hispanics in all but one test in one grade 
level. Texas Asians do better than Iowa Asians 
in all tests that both states report in common. 
In what sense could we possibly conclude that 
Iowa does a better job educating its students 
than does Texas? We think it obvious that
the aggregated data here are misleading. The 
only reason for Iowa’s higher overall average 
scores is that, compared to Texas, its student 
population is disproportionately composed of 
whites. Iowa’s high ranking is merely a statis- 
tical artifact of a flawed measurement system. 
When student heterogeneity is considered, 
Texas schools clearly do a better job educating 
students, at least as indicated by the perfor- 
mance of students as measured by NAEP data. 
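
The aggregation effect described above can be reproduced with a few made-up numbers. The following is a minimal sketch only: the group scores and enrollment shares are hypothetical (they are not NAEP figures), and "State A" and "State B" are placeholder names rather than Iowa and Texas.

```python
# Hypothetical scores and enrollment shares; NOT actual NAEP figures.
# State B outscores State A within every group, yet trails in the aggregate
# because its students are distributed differently across the groups.
state_a = {"group_1": (250, 0.90), "group_2": (225, 0.10)}  # (mean score, share)
state_b = {"group_1": (255, 0.40), "group_2": (235, 0.60)}


def aggregate_mean(groups):
    """Enrollment-weighted average score across groups."""
    return sum(score * share for score, share in groups.values())


def equal_weight_mean(groups):
    """Simple average of group means, with each group weighted equally."""
    return sum(score for score, _ in groups.values()) / len(groups)


agg_a, agg_b = aggregate_mean(state_a), aggregate_mean(state_b)
eq_a, eq_b = equal_weight_mean(state_a), equal_weight_mean(state_b)

print(f"Aggregate:    A={agg_a:.1f}  B={agg_b:.1f}")
print(f"Equal weight: A={eq_a:.1f}  B={eq_b:.1f}")
```

In this toy case State A has the higher aggregate average even though State B is ahead in both groups, which is the pattern the Iowa and Texas comparison illustrates.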

This discrepancy in scores between these 
two states is no fluke either. In numerous in- 
stances, state education rankings change sub- 
stantially when we take student heterogeneity 
into account. The makers of the NAEP, to
their credit, allow comparisons to be made for 
heterogeneous subgroups of the student popu- 
lation. However, almost all the rankings fail to 
utilize these useful data to correct for this prob- 
lem. This methodological oversight skews pre- 
vious rankings in favor of homogeneously white 
states. In constructing our ranking, we will 
use these same NAEP data, but break down 
scores into the aforementioned 24 categories 
by test subject, grade, and ethnic group to more
properly account for heterogeneity.

Importantly, we wish to make clear that our 
use of these four racial categories does not im- 
ply that differences between groups are in any 
way fixed or would not change under different 
circumstances. Using these categories to dis- 
aggregate students has the benefit of simplic- 
ity while also largely capturing the effects of 
other important socioeconomic variables that 
differ markedly between ethnic groups (and 
also between students within these groups).
Such socioeconomic factors are related to 
race in complex ways, and controlling for race 
is common in the economic literature. In ad- 
dition, by giving equal weight to each racial 
category, our procedure puts a greater empha- 
sis on how well states teach each category of 
students than do traditional rankings, paying 
somewhat greater attention to how groups 
that have historically suffered from discrimi- 
nation are faring. 


A STATE RANKING OF LEARNING 
THAT ACCOUNTS FOR 
STUDENT HETEROGENEITY 


Our methodology is to compare state 
scores for each of three subjects (math, read- 
ing, and science), four major ethnic groups 
(whites, blacks, Hispanics, and Asian/Pa- 
cific Islanders) and two grades (fourth and 
eighth), for a total of 24 potential observa-
tions in each state and the District of Colum-
bia. We exclude factors such as graduation
rates and pre-K enrollment that do not mea- 
sure how much students have learned. 

We give each of the 24 tests equal weight
and base our ranking on the average of the test
scores. This ranking is thus limited to mea-
suring learning and does so in a way that avoids
the aggregation fallacy. We refer to this as the 
“quality” rank. 
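
The equal-weight procedure can be sketched in a few lines. This is an illustration under stated assumptions rather than the report's own computation: it assumes each test is standardized into z-scores across the states that report it (the report refers to state z-scores later on) and that a state's average is taken over the tests it reports. The state names, test names, and scores are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical scores for three states on three disaggregated tests
# (one subject/grade/group combination each). None marks a test that a
# state does not report. These are NOT actual NAEP values.
scores = {
    "State A": {"test_1": 240.0, "test_2": 220.0, "test_3": None},
    "State B": {"test_1": 250.0, "test_2": 230.0, "test_3": 260.0},
    "State C": {"test_1": 245.0, "test_2": 210.0, "test_3": 270.0},
}


def quality_ranking(scores):
    """Rank states by the equal-weight average of their per-test z-scores."""
    tests = sorted({t for by_test in scores.values() for t in by_test})
    z_by_state = {state: [] for state in scores}
    for test in tests:
        # Standardize only across the states that report this test.
        reported = {s: v[test] for s, v in scores.items() if v.get(test) is not None}
        values = list(reported.values())
        mu, sigma = mean(values), pstdev(values)
        for state, value in reported.items():
            z_by_state[state].append((value - mu) / sigma)
    avg_z = {state: mean(zs) for state, zs in z_by_state.items()}
    ordered = sorted(avg_z, key=avg_z.get, reverse=True)
    return {state: rank for rank, state in enumerate(ordered, start=1)}, avg_z


ranks, avg_z = quality_ranking(scores)
for state, rank in sorted(ranks.items(), key=lambda item: item[1]):
    print(rank, state, round(avg_z[state], 3))
```

Standardizing each test before averaging keeps tests with wider score spreads from dominating the average; whether the authors used the population or sample standard deviation is not stated in this section, so the use of `pstdev` here is an assumption.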

From left to right, Table 1 shows our rank- 
ing using disaggregated NAEP scores (“qual- 
ity ranking”), then how rankings would look 
if based solely on aggregate state NAEP test 
scores (“aggregated rank”), and finally the U.S.
News rankings.


Table 1
State rankings using disaggregated NAEP scores

Quality rank*  State                 Aggregated rank  U.S. News rank**
1              Virginia              5                12
2              Massachusetts         1                1
3              Florida               16               40
4              New Jersey            2                3
5              District of Columbia  51               -
6              Texas                 35               33
7              Maryland              24               13
8              Georgia               32               35
9              Wyoming               6                34
10             Indiana               6                17
11             North Dakota          17               28
12             Montana               22               10
13             North Carolina        26               23
14             New Hampshire         3                2
15             Colorado              14               30
16             Nebraska              9                15
17             Delaware              35               18
18             Washington            10               26
19             Ohio                  14               36
20             Connecticut           11               5
21             Arizona               38               48
22             South Dakota          19               22
23             Kentucky              29               24
24             Illinois              28               14
25             Kansas                22               27
26             Pennsylvania          12               11
27             Missouri              26               19
28             Vermont               8                4
29             South Carolina
30             Tennessee
31             New York
32             Iowa
33             Minnesota
34             Mississippi
35             California
36             Michigan
37             Hawaii
38             Idaho
39             Utah
40             Rhode Island
41             Oklahoma
42             New Mexico
43             Alaska
44             Nevada
45             Oregon
46             Wisconsin
47             Louisiana
48             Arkansas
49             Maine
50             West Virginia
51             Alabama

*Controls for heterogeneity; **Does not control for heterogeneity 

Source: National Center for Education Statistics, 2017 NAEP Mathematics and Reading Assessments, https://www.nationsreportcard.gov/reading_math_2017_highlights/.

The difference between the aggregated
rankings and the U.S. News rankings shows
the effect of U.S. News’ use of only partial
NAEP data (no fourth-grade or science
scores) and the inclusion of factors unre-
lated to learning (e.g., graduation rates). The
effects are substantial.

The difference between the disaggregated
quality rank (first column) and the aggregated
rank (third column) shows the effects of con-
trolling for heterogeneity, our focus in this
report, which are also substantial. States with
small minority population shares (defined as
Hispanic or black) tend to fall in the rankings
when the data are disaggregated, and states
with high shares of minority populations tend
to rise when the data are disaggregated.

There are substantial differences between
our quality rankings and the U.S. News rank-
ings. For example, Maine drops from 6th in
the U.S. News ranking to 49th in the qual-
ity ranking. Florida, which ranks 40th in U.S.
News’ ranking, jumps to 3rd in our quality ranking.

Maine apparently does very well in the 
nonlearning components of U.S. News’ rank- 
ings; its aggregated NAEP scores would put 
it in 24th place, 18 positions lower than its 
U.S. News rank. But the aggregated NAEP
scores overstate what its students have 
learned; Maine’s quality ranking is a full 25 
positions below that. On the 10 achieve- 
ment tests reported for Maine, its rankings 
on those tests are 46th, 45th, 48th, 37th, 41st, 
40th, 34th, goth, 41st, and 23rd. It is astound-
ing that U.S. News could rank Maine as high
as 6th, given the deficient performance of 
both its black and white students (the only 
two groups reported for Maine) relative to 
black and white students in other states. But 
since Maine’s student population is about 90 
percent white, the aggregated scores bias the 
results upward. 

On the other hand, Florida apparently
scores poorly on U.S. News’ nonlearning at-
tributes, since its aggregated NAEP scores
(ranked 16th) are much better than its U.S. News
score (ranked 40th). Florida’s student popula-
tion is about 60 percent nonwhite, meaning
that the aggregate scores are likely to underesti-
mate Florida’s education quality, which is borne
out by the quality ranking. In fact, Florida gets
considerably above-average scores for all but
one of its 24 reported tests, with student per-
formance on half of its tests among the top five
states, which is how it is able to earn a rank of
3rd in our quality rankings.

The decline in Maine’s ranking is repre-
sentative of some other New England and
midwestern states such as Vermont, New
Hampshire, and Minnesota, which tend to
have largely white populations, leading to mis-
leadingly high positions in typical rankings
such as U.S. News’. The increase in Florida’s
ranking mirrors gains in the rankings of other
southern and southwestern states, such as
Texas and Georgia, with large minority popu-
lations. Conventional rankings thus seriously
distort beliefs about which parts of the country
do a better job educating their students.

We should note that the District of Co-
lumbia, which is not ranked at all by U.S.
News, does very well in our quality rankings.
It is not surprising that D.C.’s disaggregated 
ranking is quite different from the aggregat- 
ed ranking, given that D.C.’s population is 
about 85 percent minority. Nevertheless, we 
suspect that the very large change in rank is 
something of an aberration. D.C.’s high rank- 
ing is driven by the unusually outstanding 
scores of its white students, who come from 
disproportionately affluent and educated 
families, and whose scores were more than
four standard deviations above the national 
white mean in each test subject they par- 
ticipated in (a greater difference than for any 
other single ethnic group in any state). Were 
it not for these scores, D.C. would be some- 
what below average (with D.C. blacks slightly 
below the national black average and Hispan- 
ics considerably below their average). 

Massachusetts and New Jersey, which are 
highly ranked by U.S. News, are also highly
ranked by our methodology, indicating that 
they deserve their high rankings based on 
the performance of all their student groups. 
Other states have similar placements in both 
rankings. Overall, however, the correlation 
between our rankings and U.S. News’ rankings
is only 0.35, which, while positive, does not
indicate a strong relationship.
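
As a rough illustration of how two rankings can be compared, the sketch below computes a correlation coefficient between the quality rank and the U.S. News rank for a handful of states copied from Table 1. It assumes a Pearson correlation applied to the rank values; the report does not say in this section which correlation measure it uses, and because only a subset of states is included, the result is not expected to match the 0.35 reported for the full set.

```python
from math import sqrt

# (quality rank, U.S. News rank) for a few states copied from Table 1.
# Only a subset is used, so this will NOT reproduce the full-sample 0.35.
pairs = {
    "Virginia": (1, 12),
    "Massachusetts": (2, 1),
    "Florida": (3, 40),
    "New Jersey": (4, 3),
    "Texas": (6, 33),
    "Maryland": (7, 13),
    "Georgia": (8, 35),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


quality = [q for q, _ in pairs.values()]
us_news = [u for _, u in pairs.values()]
r = pearson(quality, us_news)
print(f"Correlation on this subset: {r:.2f}")
```

A value near 1 would mean the two rankings order states almost identically, while a value near 0 would mean knowing one ranking says little about the other.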

Failing to disaggregate student-performance 
data and inserting factors not related to learn- 
ing distorts results. By construction, our mea- 
sure better reflects the relative performance 
of each group of students in each state, as 
measured by the NAEP data. We believe 
the differences between our rankings and 
the conventional rankings warrant a serious 
reevaluation of which state education systems
are doing the best jobs for their students; we 
hope the conventional ranking organizations 
will be prompted to make changes that more 
closely follow our methodology. 


EXAMINING THE EFFICIENCY OF
EDUCATION EXPENDITURES


The overall quality of a school system is ob- 
viously of interest to educators, parents, and 
politicians. However, it’s also important to 
consider, on behalf of taxpayers, the amount 
of government expenditure undertaken to 
achieve a given level of success. For example, 
New York spends the most money per student 
($22,232), almost twice as much as the typical 
state. Yet that massive expenditure results in
a rank of only 31 in Table 1. Tennessee, on the
other hand, achieves a similar level of success
(ranked 30th) and spends only $8,739 per stu- 
dent. Although the two states appear to have 
education systems of similar quality, the citi- 
zens of Tennessee are getting far more bang 
for the buck. 

To show the spending efficiency of a state’s 
school system, Figure 1 plots per student ex- 
penditures on the horizontal axis against stu- 
dent performance on the vertical axis. Notice 
that New York and Tennessee are at about the 
same height but that New York is much far- 
ther to the right. 

The most efficient educational systems 
are seen in the upper-left corner of Figure 1, 
where systems are high quality and inexpen- 
sive. The least efficient systems are found in 
the lower right. From casual examination of
Figure 1, it appears likely that some states are
not using education funds efficiently.

Figure 1
Scatterplot of per pupil expenditures and average normalized NAEP test scores
[Horizontal axis: per pupil expenditure (dollars)]
Source: National Center for Education Statistics, 2017 NAEP Mathematics and Reading Assessments, https://www.nationsreportcard.gov/reading_math_2017_highlights/.

Because spending values are nominal (that
is, not adjusted for cost-of-living differences
across states), using unadjusted spending fig-
ures might disadvantage high-cost states, in
which above-average education costs may
reflect price differences rather than more ex-
travagant spending. For this reason, we also cal-
culate a ranking based on education quality per
adjusted dollar of expenditure, where the ad-
justment controls for statewide differences in
the cost of living (COL). The COL-adjusted
rankings are probably the rankings that best
reflect how efficiently states are providing edu-
cation. Adjusting for COL has a large effect on
high-cost states such as Hawaii, California, and
D.C.

Table 2 presents two spending-efficiency
rankings of states that capture how well their
heterogeneous students do on NAEP exams
in comparison to how much the state spends
to achieve those results. These rankings are
calculated by taking a slightly revised version of
the state’s z-score and dividing it by the nomi-
nal dollar amount of educational expenditure
or by the COL-adjusted educational expendi-
ture made by the state. These adjustments
lower the rank of states like New York, which
spends a great deal for mediocre performance,
and increase the rank of states like Tennessee,
which achieves similar performance at a much
lower cost. Massachusetts and New Jersey,
which impart a good deal of knowledge to their
students, do so in such a costly manner using
nominal values that they fall out of the top 20,
although Massachusetts, having a higher cost
of living, remains in the top 20 when the cost
of living adjustment is made. States like Idaho
and Utah, which achieve only mediocre success
in imparting knowledge to students, do it so in-
expensively that they move up near the top 10.
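
The efficiency calculation can be sketched as a ratio of quality to spending. This is a hedged illustration, not the report's formula: the "slightly revised" z-score is not spelled out in this section, so the sketch assumes a simple constant shift that keeps every score positive. The per-student spending figures are the ones quoted above for New York and Tennessee; the z-scores and cost-of-living index values are hypothetical.

```python
# Per-student spending for New York and Tennessee comes from the text;
# the z-scores and cost-of-living (COL) index values are hypothetical.
states = {
    "New York": {"z": -0.10, "spend": 22232, "col_index": 1.30},
    "Tennessee": {"z": -0.08, "spend": 8739, "col_index": 0.90},
}

# Assumption: shift z-scores so all are positive before dividing, since a
# ratio with a negative numerator would reward higher spending.
SHIFT = 3.0


def efficiency(z, spend, col_index=1.0):
    """Shifted quality score per 1,000 (optionally COL-adjusted) dollars."""
    adjusted_spend = spend / col_index
    return (z + SHIFT) / (adjusted_spend / 1000)


nominal = {s: efficiency(d["z"], d["spend"]) for s, d in states.items()}
col_adj = {s: efficiency(d["z"], d["spend"], d["col_index"]) for s, d in states.items()}

for name in states:
    print(f"{name}: nominal={nominal[name]:.3f}, COL-adjusted={col_adj[name]:.3f}")
```

With similar quality scores, the lower-spending state comes out well ahead on either measure, and dividing spending by a COL index above 1 narrows the gap for a high-cost state.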

The top of the efficiency ranking is domi- 
nated by states in the South and Southwest. 
This result differs markedly from the tradi-
tional rankings.

The correlation between these spending
efficiency rankings and the U.S. News rankings
drops to -0.14 and -0.06 for the nominal and
COL-adjusted efficiency rankings, respectively.
This drop is not surprising since the rankings in
Table 2 treat expenditures as something to be
economized on, whereas the U.S. News rank-
ings don’t consider K-12 expenditures at all
(and other rankings consider higher expendi-
tures purely as a plus factor). The correlations
of the Table 1 quality rankings and Table 2 effi- 
ciency rankings, with nominal and adjusted ex- 
penditures, are 0.54 and 0.65, respectively. This 
indicates that accounting for the efficiency of 
expenditures substantially alters the rankings, 
although somewhat less so when the cost of liv- 
ing is adjusted for. This higher correlation for 
the COL rankings makes sense because high- 
cost states devoting the same share of resources 
as the typical state would be expected to spend 
above-average nominal dollars, and the COL 
adjustment reflects that. 


Other Factors Possibly Related 
to Student Performance 


Our data allow us to make a brief analysis of 
some factors that might be related to student 
performance in states. Our candidate factors 
are expenditure per student (either nominal 
or COL adjusted), student-teacher ratios, the 
strength of teacher unions, the share of stu-
dents in private schools, and the share in char-
ter schools. The expenditure per student
variable is considered in a quadratic form since 
diminishing marginal returns is a common ex- 
pectation in economic theory. 

Table 3 presents the summary statistics for 
these variables. The average z-score is close to 
zero, which is to be expected. Nominal ex-
penditure per student ranges from $6,837 to
$22,232, with the COL-adjusted values having 
a somewhat smaller range. The union strength 
variable is merely a ranking from 1 to 51, with 51 
being the state with the most powerful union 
effect. The number of students per teacher 
ranges from a low of 10.54 to a high of 23.63. 
The other variables are self-explanatory. 

We use multiple regression analysis to 
measure the relationship between these vari- 
ables and our (dependent) variable—the av- 
erage z-scores drawn from state NAEP test 
scores in the 24 categories mentioned above. 

Table 2
State rankings adjusted for student heterogeneity and expenditures

COL* efficiency rank  State                 Efficiency rank**
                      Florida
                      Texas
                      Virginia
                      Arizona
                      Georgia
                      North Carolina
                      Indiana
                      South Dakota
                      Colorado
                      Massachusetts
                      Hawaii
                      Utah
                      Maryland
                      California
                      Idaho
                      Montana
                      District of Columbia
                      Washington
                      Kentucky
                      Tennessee
                      South Carolina
                      New Jersey
                      North Dakota
                      Nevada
                      Mississippi
                      Oklahoma
                      New Hampshire
                      Ohio
                      Nebraska
                      Oregon
                      Kansas
                      Missouri
                      Delaware
                      New Mexico
                      Minnesota
                      Iowa
                      Wyoming
                      Connecticut
                      Pennsylvania
                      Illinois
                      Michigan
                      Rhode Island
                      Vermont
                      Wisconsin
                      Arkansas
                      New York
                      Louisiana
                      Alaska
                      Maine
                      Alabama
                      West Virginia

*COL = cost of living.
**Using nominal dollars.


Sources: National Center for Education Statistics, 2017 NAEP Mathematics and Reading Assessments, https://www.nationsreportcard.gov/ 
reading_math_2017_highlights/; and Missouri Economic Research and Information Center, Cost of Living Data Series 2017 Annual Average, 
https://www.missourieconomy.org/indicators/cost_of_living. 


Regression analysis can show how variables
are related to one another but cannot, by itself,
demonstrate causality, that is, whether changes
in one variable lead to changes in another.
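
For readers unfamiliar with the technique, the sketch below fits a small multiple regression with a quadratic spending term by ordinary least squares. It is illustrative only: the data are synthetic, only two of the report's candidate factors are included, and the helper names `solve` and `ols` are hypothetical. The software and exact specification the authors used are not described in this section.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic data: spending in thousands of dollars and a made-up union rank.
spend = [7.0, 9.0, 11.0, 13.0, 15.0, 18.0, 22.0]
union = [5, 40, 12, 33, 20, 48, 27]
true_beta = [-2.0, 0.4, -0.011, -0.02]  # constant, spend, spend^2, union
rows = [[1.0, s, s * s, u] for s, u in zip(spend, union)]
y = [sum(b * x for b, x in zip(true_beta, r)) for r in rows]  # noise-free outcome

beta = ols(rows, y)
print([round(b, 4) for b in beta])
```

Because the synthetic outcome is generated without noise, the fit recovers the chosen coefficients; with real data the estimates would carry sampling error, which is what the p-values in the regression table summarize.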

Table 4 provides the regression results us- 
ing COL expenditures (results on the left) or 
using nominal expenditures (results on the 
right). To save space, we only include the co- 
efficients and p-values, the latter of which, 
when subtracted from one, provides statisti- 
cal confidence levels. Those coefficients for 
variables that were statistically significant are 
marked with asterisks (one asterisk indicates 
a 90 percent confidence level and two a level
of 95 percent). 

The choice of nominal vs. COL expendi- 
tures leads to a large difference in the results. 
The COL-adjusted results are likely to lead to 
a greater number of correct conclusions. 

Nominal expenditures per student are
related in a positive and statistically signifi-
cant manner to student performance up to a
point, but the positive effect of expenditures
per student declines as expenditures per stu-
dent increase. The coefficients on the two
expenditure-per-student variables indicate
that additional nominal spending is no longer
related to performance when nominal spend-
ing gets to a level of $18,500 per student, a
level that is exceeded by only a handful of
states. The predicted decline in student
performance for the few states exceeding
the $18,500 limit, assuming causality from
spending to performance, is quite small (ap-
proximately two rank positions for the state
with the largest expenditure), so that this
evidence is best interpreted as supporting a
view that the states with the highest spend-
ing have reached a saturation point beyond
which no more gains can be made.
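
The turning point of a quadratic relationship follows directly from its two coefficients: for a fitted effect of b1·x + b2·x² with b2 < 0, the peak is at x = −b1 / (2·b2). The sketch below applies this to the nominal-dollar coefficients as printed in Table 4. Those printed values are rounded, so the result (roughly $18,000) lands near, but not exactly on, the $18,500 cited in the text; the gap may reflect rounding in the published coefficients or details of the authors' calculation that are not shown here.

```python
# Nominal-dollar coefficients as printed in Table 4 (rounded in the source).
b_linear = 3.75e-04    # expenditure per student
b_squared = -1.04e-08  # expenditure per student squared


def turning_point(b1, b2):
    """Spending level where b1*x + b2*x**2 peaks (requires b2 < 0)."""
    return -b1 / (2 * b2)


def spending_effect(x, b1=b_linear, b2=b_squared):
    """Predicted contribution of per-student spending x to the z-score."""
    return b1 * x + b2 * x ** 2


peak = turning_point(b_linear, b_squared)
print(f"Implied turning point: ${peak:,.0f} per student")

# Change in the predicted z-score between the peak and the highest
# nominal spending level reported in the text ($22,232).
drop = spending_effect(22232) - spending_effect(peak)
print(f"Predicted change beyond the peak: {drop:.3f}")
```

The small negative change beyond the peak is consistent with the reading given above, namely a plateau rather than a meaningful decline.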

Using COL-adjusted values, however, stark- 
ly changes results. With COL values, no signif- 
icant relationship is found between spending 
and student performance, either in magnitude 
or statistical significance. This does not neces- 
sarily imply that spending overall has no effect 
on outcomes (assuming causality), but merely 


Table 3
Summary statistics

Variables                                Mean             Minimum        Maximum
Z-score                                  -0.0488          -1.5177        1.2213
Expenditure per student (nominal / COL)  12,256 / 11,548  6,837 / 7,117  22,232 / 17,631
Union strength                           26               1              51
Students per teacher                                      10.54          23.63
Private school share of students
Charter share of students
Voucher dummy
No. of observations

Sources: National Center for Education Statistics, 2017 NAEP Mathematics and Reading Assessments, https://www.nationsreportcard.gov/ 
reading_math_2017_highlights/; Missouri Economic Research and Information Center, Cost of Living Data Series 2017 Annual Average, 
https://www.missourieconomy.org/indicators/cost_of_living; National Center for Education Statistics, Digest of Education Statistics: 2017, 
Table 236.65, https://nces.ed.gov/programs/digest/d17/tables/dt17_236.65.asp?current=yes; Amber M. Winkler, Janie Scull, and Dara 
Zeehandelaar, “How Strong are Teacher Unions? A State-By-State Comparison,” Thomas B. Fordham Institute and Education Reform
Now, 2012; Digest of Education Statistics: 2017, Table 208.40, https://nces.ed.gov/programs/digest/d17/tables/dt17_208.40.asp?current=yes; 
National Center for Education Statistics, Private School Universe Survey, https://nces.ed.gov/surveys/pss/; charter school share determined 
by dividing the total enrollment in charter schools by the total enrollment in all public schools for each state, Digest of Education Statistics: 
2017, Table 216.90, https://nces.ed.gov/programs/digest/d17/tables/dt17_216.90.asp?current=yes, and Digest of Education Statistics: 2016, Table 
203.20, https://nces.ed.gov/programs/digest/d16/tables/dt16_203.20.asp.; and Education Commission of the States, “SO-State Comparison: 
Vouchers,” March 6, 2017, http://www.ecs.org/50-state-comparison-vouchers/. 
Table 4
Multiple regression results explaining quality of education

                                    Cost of living adjusted     Nominal dollars
                                    Coefficient    p-value      Coefficient    p-value
Expenditure per student             3.89E-05       0.871        3.75E-04       0.062
Expenditure per student squared     -2.75E-10      0.977        -1.04E-08      0.089
Union strength                      -0.01125       0.091        -0.024         0.026
Students per teacher                -0.04499       0.219        0.013          0.755
Private school share of students    -0.68112       0.823        -1.193         0.691
Charter share of students           1.96458        0.033        1.098          0.342
Vouchers allowed                    -0.18267       0.435        -0.14306       0.538
Constant                            0.53484        0.765        -2.44191       0.11
R-squared/observations              0.15           51           0.217          51


Sources: National Center for Education Statistics, 2017 NAEP Mathematics and Reading Assessments,
https://www.nationsreportcard.gov/reading_math_2017_highlights/; Missouri Economic Research and Information Center, Cost of Living Data Series 2017 Annual
Average, https://www.missourieconomy.org/indicators/cost_of_living; National Center for Education Statistics, Digest of Education
Statistics: 2017, Table 236.65, https://nces.ed.gov/programs/digest/d17/tables/dt17_236.65.asp?current=yes; Amber M. Winkler, Janie
Scull, and Dara Zeehandelaar, “How Strong are Teacher Unions? A State-By-State Comparison,” Thomas B. Fordham Institute
and Education Reform Now, 2012; Digest of Education Statistics: 2017, Table 208.40,
https://nces.ed.gov/programs/digest/d17/tables/dt17_208.40.asp?current=yes; National Center for Education Statistics, Private School Universe Survey,
https://nces.ed.gov/surveys/pss/; charter school share determined by dividing the total enrollment in charter schools by the total enrollment in all public
schools for each state, Digest of Education Statistics: 2017, Table 216.90,
https://nces.ed.gov/programs/digest/d17/tables/dt17_216.90.asp?current=yes, and Digest of Education Statistics: 2016, Table 203.20,
https://nces.ed.gov/programs/digest/d16/tables/dt16_203.20.asp; and Education Commission of the States, “50-State Comparison: Vouchers,” March 6, 2017,
http://www.ecs.org/50-state-comparison-vouchers/.


that most states have reached a sufficient level 
of spending such that additional spending 
does not appear to be related to achievement 
as measured by these test scores. This is a dif- 
ferent conclusion from that based on nominal 
expenditures. These different results imply 
that care must be taken, not just to ensure that 
achievement test scores are disaggregated in 
analyses of educational performance, but also 
that if expenditures are used in such analyses, 
they are adjusted for cost of living differentials. 

The union strength variable in Table 4 has 
a substantial and statistically significant nega- 
tive relationship with student achievement. 
The coefficient in the nominal expenditure re- 
gressions suggests a relationship such that if a 
state went from having the weakest unions to 


the strongest unions, holding the other education
factors constant, that state would have a
decrease in its z-score of over 1.22 (0.024 x 51).
To put this in perspective, note in Table 3 that 
the z-scores vary from a high of 1.22 to a low 
of -1.51, a range of 2.73. Thus, the shift from 
weakest to strongest unions would move a 
state about 45 percent of the way through this 
total range, or equivalently, alter the rank of 
the state by about 23 positions.29 This is a dramatic
result. The COL regressions also show a
large relationship, but it is only about half the 
magnitude of the coefficient in the nominal 
expenditure regressions. This negative rela- 
tionship suggests an obvious interpretation. 
It is well known that teachers’ unions aim to 
increase wages for their members, which may 


increase student performance if higher qual- 
ity teachers are drawn to the higher salaries. 
Such a hypothesis is inconsistent with the 
finding here, which is instead consistent with 
the view that unions are negatively related to 
student performance, presumably by oppos- 
ing the removal of underperforming teachers, 
opposing merit-based pay, or because of union 
work rules. While much of the empirical lit- 
erature finds positive relationships between 
unionization and student performance, stud- 
ies that most effectively control for heteroge- 
neous student populations, as we have, tend 
to find more negative relationships, such as 
those found here.30
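The size of the union relationship quoted above follows from simple arithmetic. As a rough check, the sketch below reproduces it using only figures that appear in the text and in Table 4; the variable names are illustrative, and the conversion to rank positions is the same approximation the text itself describes as rough.

```python
union_coefficient = 0.024     # magnitude of the nominal-dollar coefficient in Table 4
rank_span = 51                # weakest-to-strongest span used in the text
z_high, z_low = 1.22, -1.51   # extremes of the state z-scores cited from Table 3
n_states = 51                 # 50 states plus Washington, D.C.

z_shift = union_coefficient * rank_span     # change in z-score across the full span
z_range = z_high - z_low                    # total spread of state z-scores
share_of_range = z_shift / z_range          # fraction of the spread covered
approx_positions = share_of_range * n_states

print(f"z-score shift: {z_shift:.3f}")
print(f"share of total range: {share_of_range:.0%}")
print(f"approximate rank positions: {approx_positions:.0f}")
```

The conversion to positions treats ranks as evenly spaced along the z-score range, which overstates or understates the change depending on where in the distribution a state starts.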

Our results also indicate that having a 
greater share of students in charter schools 
is positively related to student achievement, 
with the result being statistically significant 
in the COL regressions but not in the nominal 
expenditure regressions. The size of the rela- 
tionship is fairly small, however, indicating, 
if the relationship were causal, that when a 
state increases its share of students in charter 
schools from 0 to 50 percent (slightly above
the level of the highest observation) it would 
be expected to have an increase in rank of only 
0.9 positions (0.5 x 1.8) in the COL regression 
and about half of that in the nominal expen- 
diture regressions (where the coefficient is not 
statistically significant).31 Given that there is
great heterogeneity in charter schools both 
within and between states, it is not surpris- 
ing that our rather simple statistical approach 
does not find much of a relationship. 

We also find that the share of students in 
private schools has a small negative relation- 
ship with the performance of students in 
public schools, but the level of statistical con- 
fidence is far too low for these results to be 
given any credence. (Although private school 
students take the NAEP exam, the NAEP 
data we use are based only on public school 
students.) Similarly, the existence of vouch- 
ers appears to have a negative relationship to 
achievement, but the high p-values tell us we 
cannot have confidence in those results. 

There is some slight evidence, based on 


the COL regression, that higher student- 
teacher ratios have a small negative relation- 
ship with student performance, but the level 
of statistical confidence is below normally ac- 
cepted levels. Though having more students 
per teacher is theorized to be negatively re- 
lated to student performance, the empirical 
literature largely fails to find consistent ef- 
fects of student-teacher ratios and class size 
on student performance.32 We should not be
too surprised that student-teacher ratios do 
not appear to have a clear relationship with 
learning since the student-teacher ratios used 
here are aggregated for entire states, merging 
together many different classrooms in ele- 
mentary, middle, and high schools. 


SOME LIMITATIONS 


Although this study constitutes a signifi- 
cant improvement on leading state education 
rankings, it retains some of their limitations. 

If the makers of state education rankings 
were to be frank, they would acknowledge 
that the entire enterprise of ranking state-level 
systems is only a blunt instrument for judg- 
ing school quality. There exists substantial 
variation in educational quality within states. 
Schools differ from district to district and 
within districts. We generally dislike the idea of 
painting the performance of all schools in a given
state with the same brush. However, state-level
rankings do provide an intuitively pleasing
basis for lawmakers and interested citizens to 
compare state education policies. Because state 
rankings currently play such a prominent role 
in the public debate on education policy, their 
more glaring methodological defects detailed 
above demand rectification. Any state ranking 
is nonetheless limited by aggregation inherent 
at the state-level unit of analysis. 

Another limitation to our study, common 
to virtually all state education rankings, is 
that we treat the result of education as a one- 
dimensional variable. Of course, educational 
results are multifaceted and more complex than 
a single measure could capture. A standardized 
test may not pick up potentially important
qualities such as creativity, critical thinking, or
grit. Part of the problem is that there is no ac- 
cepted measurement of those attributes. 

We also are using a data snapshot that re- 
flects measures of learning at a particular mo- 
ment in time. However, the performance of 
students at any grade level depends on their 
education at all prior grade levels. A ranking of 
states based on student performance is the cul- 
mination of learning over a lengthy time period. 
An implicit assumption in creating such rank- 
ings is that the quality of various school systems 
changes slowly enough for a snapshot in one 
year to convey meaningful information about 
the school system as it exists over the entire in- 
terval in which learning occurred. This assump- 
tion allows us to attribute current or recent 
student performance, which is largely based on 
past years of teaching, to the teaching quality 
currently found in these schools. This assump- 
tion is present in most state rankings but may 
obscure sudden and significant improvement, 
or deterioration, in student knowledge that oc- 
curs in discrete years. 


CONCLUSIONS 

While the state level may be too aggregated 
a unit of analysis for the optimal examination 
of educational outcomes, state rankings are 
frequently used and discussed. Whether based 
appropriately on learning outcomes or inap- 
propriately on nonlearning factors, compari- 
sons between states greatly influence the public 
discourse on education. When these rankings 
fail to account for the heterogeneity of student 
populations, however, they skew results in favor 
of states with fewer socioeconomically chal- 
lenged students. 

Our ranking corrects these problems by fo- 
cusing on outputs and the value added to each 
of the demographic groups the state education 
system serves. Furthermore, we consider the 
cost-effectiveness of education spending in U.S.
states. States that spend efficiently should be 
recognized as more successful than states pay- 
ing larger sums for similar or worse outcomes. 


Adjusting for the heterogeneity of students has a
powerful effect on the assessments of how well
states educate their students. Certain southern
and western states,
such as Florida and Texas, have much better 
student performances than appears to be the 
case when student heterogeneity is not taken 
into account. Other states, such as Maine 
and Rhode Island in New England, fall sub- 
stantially. These results run counter to con- 
ventional wisdom that the best education 
is found in northern and eastern states with 
powerful unions and high expenditures. 

This difference is even more pronounced 
when spending efficiency, a factor generally 
neglected in conventional rankings, is taken 
into account. Florida, Texas, and Virginia are 
seen to be the most efficient in terms of quality 
achieved per COL-adjusted dollar spent. Con- 
versely, West Virginia, Alabama, and Maine are 
the least efficient. Some states that do an excel- 
lent job educating students, such as Massachu- 
setts and New Jersey, also spend quite lavishly 
and thus fall considerably when spending
efficiency is considered.

Finally, we examine some factors thought 
to influence student performance. We find 
evidence that state spending appears to have 
reached a point of zero returns and that 
unionization is negatively related to student 
performance, and some evidence that charter 
schools may have a small positive relationship 
to student achievement. We find little evi- 
dence that class size, vouchers, or the share of 
students in private schools have measurable 
effects on state performance. 

Which state education systems are worth 
emulating and which are not? The conventional 
answer to this question deserves to be reevalu- 
ated in light of the results presented in this 
report. We hope that our rankings will better 
inform pundits, policymakers, and activists as 
they seek to improve K-12 education. 


APPENDIX 

Conventional education-ranking meth- 
odologies based on NAEP achievement tests 
are likely to skew results. In this Appendix, 


we provide a simple example of how and why 
that happens. 

Our example assumes two types of students 
and three types of schools (or state school sys- 
tems). The two columns on the right in appen- 
dix Table 1 denote different types of student, 
and each row represents a different school. 
School B is assumed to be 10 percent better 
than School A, and School C is assumed to be 
20 percent better than School A, regardless of 
the student type being educated. 

There are two types of students; S2 stu- 
dents are better prepared than S1 students. 
Students of the same type score differently on 
standard exams depending on which school 
they are in, but the two student types also per- 
form differently from each other no matter 
which school they attend. Depending on the 
proportions of each type of student in a given 
school, a school’s rank may vary substantially 
if the wrong methodology is used. 

An informative ranking should reflect each 
school’s relative performance, and the scores 
on which the rankings are based should reflect 
the 10 percent difference between School A 
and School B, and the 20 percent difference 
between School A and School C. Obviously, 
a reliable ranking mechanism should place 
School A in 3rd place, B in 2nd, and C in 1st.

However, problems arise for the typical 
ranking procedure when schools have dif- 
ferent proportions of student types. The ap- 
pendix Table 2 shows results from a typical 
ranking procedure under two different popu- 
lation scenarios. 

School ranking 1 shows what happens when 
75 percent of School A’s students are type S2 
and 25 percent are type S1; School B’s students 


Table 1
Example of students and scores

School quality    Student 1 (S1) score    Student 2 (S2) score
School A
School B
School C

Source: Author calculations.


are split 50-50 between types S1 and S2; and
School C’s students are 75 percent type S1 and
25 percent type S2.33

Because School A has a disproportionately
large share of the stronger S2 students, it
scores above the other two schools even though 
School A is the weakest school. Ranking 1 com- 
pletely inverts the correct ranking of schools. 
This example, detailed in appendix Table 2, 
demonstrates how rankings that do not take 
the heterogeneity of students and the propor- 
tions of each type of student in each school into 
account can give entirely misleading results. 

Conversely, school ranking 2 reverses 
the student populations of schools A and C. 
School C now also has more of the strongest 
students. The rankings are correctly ordered, 
but the underlying data used for the rankings 
greatly exaggerate the superiority of School 
C. Comparing the scores of the three schools, 
School B appears to be 32 percent better than 
School A and School C appears to be 68 per- 
cent better than School A, even though we 
know (by construction) that the correct values 
are 10 percent and 20 percent, respectively.
School ranking 2 only happens to get the or- 
der right because there are no intermediary 
schools whose rankings would be improperly 
altered by the exaggerated scores of schools A 
and C in ranking 2. 
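The two scenarios can be reproduced with a short calculation. The sketch below is a minimal illustration, not the authors' own computation. The score values in appendix Table 1 did not survive in this text, so the baseline scores of 50 for S1 and 100 for S2 at School A are assumptions. They were chosen because S2 scoring twice S1 is the ratio that yields the 32 and 68 percent figures quoted above; the absolute level is arbitrary.

```python
# Assumed baseline scores at School A (hypothetical; only the 2:1 ratio matters).
BASE = {"S1": 50.0, "S2": 100.0}
# School quality multipliers stated in the text: B is 10% and C is 20% better than A.
QUALITY = {"A": 1.00, "B": 1.10, "C": 1.20}

# Share of S1 students in each school under the two scenarios.
RANKING_1 = {"A": 0.25, "B": 0.50, "C": 0.75}
RANKING_2 = {"A": 0.75, "B": 0.50, "C": 0.25}


def aggregate_score(school, share_s1):
    """Enrollment-weighted average score, ignoring student type."""
    s1 = BASE["S1"] * QUALITY[school]
    s2 = BASE["S2"] * QUALITY[school]
    return share_s1 * s1 + (1 - share_s1) * s2


def aggregate_all(mix):
    return {school: aggregate_score(school, share) for school, share in mix.items()}


def order(scores):
    """Schools from highest to lowest aggregate score."""
    return sorted(scores, key=scores.get, reverse=True)


r1 = aggregate_all(RANKING_1)
r2 = aggregate_all(RANKING_2)

print("ranking 1 order:", order(r1))
print("ranking 2 order:", order(r2))
print("ranking 2, B relative to A:", round(r2["B"] / r2["A"], 2))
print("ranking 2, C relative to A:", round(r2["C"] / r2["A"], 2))
```

Under these assumptions ranking 1 places School A first and School C last, while ranking 2 restores the correct order but with inflated gaps between the schools.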

The ranking methodology used in this pa- 
per, by contrast, compares each school for 
each type of student separately. It measures 
quality by looking at the numbers in appendix 
Table 1 and noting that each type of student 
at School B scores 10 percent higher than the 
same type of student at School A, and each 
type of student at School C scores 20 percent 


Table 2
Rankings not accounting for heterogeneity

School ranking 1
School A [1/4 S1, 3/4 S2]
School B [1/2 S1, 1/2 S2]
School C [3/4 S1, 1/4 S2]

School ranking 2
School A [3/4 S1, 1/4 S2]
School B [1/2 S1, 1/2 S2]
School C [1/4 S1, 3/4 S2]

Source: Author calculations.


higher than the same type of student at School 
A. That is what makes our methodology con- 
ceptually superior to prior methodologies. 
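A type-by-type comparison of this kind can be sketched as follows. This is a simplified illustration of the appendix example only, not the z-score procedure used for the actual state rankings. The scores are hypothetical values consistent with the 10 and 20 percent quality differences described above.

```python
# Hypothetical per-type scores (assumed values; the source table's numbers are not available).
SCORES = {
    "A": {"S1": 50.0, "S2": 100.0},
    "B": {"S1": 55.0, "S2": 110.0},
    "C": {"S1": 60.0, "S2": 120.0},
}


def relative_quality(school, reference="A"):
    """Average, over student types, of a school's score relative to the reference school."""
    ratios = [SCORES[school][t] / SCORES[reference][t] for t in SCORES[reference]]
    return sum(ratios) / len(ratios)


for school in SCORES:
    print(school, round(relative_quality(school), 2))
```

Enrollment shares never enter the calculation, so the measured gaps stay at 10 and 20 percent whichever population scenario applies.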

If all schools happened to have the same 
share of different types of students, a possibil- 
ity not shown in appendix Table 2, the conven- 
tional ranking methodology used by U.S. News
would work as well as our rankings. But our 


analysis in this paper has shown that schools 
and school systems in the real world have very 
different student populations, which is why 
our rankings differ so much from previous 
rankings. Our general methodology isn’t just 
hypothetically better under certain demo- 
graphic assumptions; rather, it is better under 
any and all demographic circumstances. 


NOTES 

1. “Pre-K–12 Education Rankings: Measuring How Well States Are
Preparing Students for College,” U.S. News & World Report, May
18, 2018, https://www.usnews.com/news/best-states/rankings/education/preK-12.
Others include those by Wallet Hub, Education Week,
and the American Legislative Exchange Council.


2. Govs. Phil Murphy of New Jersey and Greg Abbott of Texas 
recently sparred over the virtues and vices of their state busi- 
ness climates, including their education systems, in a pair of 
newspaper articles. Greg Abbott, “Hey, Jersey, Don’t Move to
Fla. to Avoid High Taxes, Come to Texas. Love, Gov. Abbott,”
Star-Ledger, April 17, 2018,
http://www.nj.com/opinion/index.ssf/2018/04/hey_jersey_dont_move_to_fla_to_avoid_high_taxes_co.html;
and Phil Murphy, “NJ Gov. Murphy to Texas
Gov. Abbott: Back Off from Our People and Companies,”
Dallas Morning News, April 18, 2018,
https://www.dallasnews.com/opinion/commentary/2018/04/18/nj-gov-murphy-texas-gov-abbott-back-people-companies.


3. Bryce Covert, “Oklahoma Teachers Strike for a 4th Day to 
Protest Rock-Bottom Education Funding,” Nation, April 5, 2018. 


4. We are aware of an earlier discussion by Dave Burge in a March 
2, 2011, posting on his “Iowahawk” blog, discussing the mismatch 
between state K-12 rankings with and without accounting for 
heterogeneous student populations,
http://iowahawk.typepad.com/iowahawk/2011/03/longhorns-17-badgers-1.html. A 2015
report by Matthew M. Chingos, “Breaking the Curve,”
https://www.urban.org/research/publication/breaking-curve-promises-and-pitfalls-using-naep-data-assess-state-role-student-achievement,
published by the Urban Institute, is a more complete
discussion of the problems of aggregation and presents on a separate
webpage updated rankings of states that are similar to ours,
but it does not discuss the nature of the differences between its 
rankings and the more traditional rankings. Chingos uses more 
controls than just ethnicity, but the extra controls have only 
minor effects on the rankings. He also uses the more complete 
“restricted use” data set from the National Assessment of Edu- 
cation Progress (NAEP), whereas we use the less complete but 
more readily available public NAEP data. One advantage of our 
analysis, in a society obsessed with STEM proficiency, is that we 
use the science test in addition to math and reading, whereas 
Chingos only uses math and reading. 


5. For a recent example of the spending hypothesis see Paul 
Krugman, “We Don’t Need No Education,” New York Times, 
April 23, 2018. Krugman approvingly cites California and New 
York as positive examples of states that have considerably raised
teacher pay over the last two decades, implying that such states 
would do a better job educating students. As noted in this paper, 
both states rank below average in educating their students. 


6. We assume, as do other rankings that use NAEP data, that 
the NAEP tests assess student performance on material that 
students should be learning and therefore reflect the success of 
a school system in educating its students. It is of course possible 
that standardized tests do not correctly measure educational 
success. This would be a particular problem if some schools alter 
their teaching to focus on doing well on those tests while other 
schools do not. We think this is less of a problem for NAEP tests
because most grades and most teachers are not included in the 
sample, meaning that when teacher pay and school funding are 
tied to performance on standardized tests, they will be tied to 
tests other than NAEP. 


7. Since 1969, the NAEP test has been administered by the Na- 
tional Center for Education Statistics within the U.S. Depart- 
ment of Education. Results are released annually as “the nation’s 
report card.” Tests in several subjects are administered to 4th, 
8th, and sometimes 12th graders. Not every state is given every 
test in every year, but all states take the math and reading tests 
at least every two years. The National Assessment Govern- 
ing Board determines which test subjects will be administered 
each year. In the analysis below, we use the most recent data for 
math and reading tests, from 2017, and the science test is from 
2015. NAEP tests are not given to every student in every state, 
but rather, results are drawn from a sample. Tests are given to a 
sample of students within each jurisdiction, selected at random 
from schools chosen so as to reflect the overall demographic 
and socioeconomic characteristics of the jurisdiction. Roughly 
20-40 students are tested from each selected school. In a com- 
bined national and state sample, there are approximately 3,000 
students per participating jurisdiction from approximately 100 
schools. NAEP 8th grade test scores are a component of U.S.
News’ state K-12 education rankings, but are highly aggregated. 


8. As direct measures of student learning for the entire student 
body, NAEP scores should form the basis of any state rankings 
of education. Nevertheless, rankings such as U.S. News’ include
not only NAEP scores, but other variables that do not measure 
learning, such as graduation rates, pre-K education quality/ 
enrollment, and ACT/SAT scores, which measure learning but 
are not, in many cases, taken by all students in a state and are 
likely to be highly correlated with NAEP scores. We believe 
that these other measures do not belong in a ranking of state 
education quality.


9. “Quality Counts 2018: Grading the States,” Education Week,
January 2018, https://www.edweek.org/ew/collections/quality-counts-2018-state-grades/index.html.
The three broad components used in this ranking include “chance for success,” “state
finances,” and “K-12 achievement.”


10. Informed by such rankings, it’s no wonder the public de- 
bate on education typically assumes more spending is always 
better, even in the absence of corresponding improvements in 
student outcomes. 


11. NAEP data also include scores for the ethnic categories 
“American Indian/Native Alaskan,” and “Two or More.” How- 
ever, too few states had sufficient data for these scores to be a 
reliable indicator of the performance of these groups in that 
state. These populations are small enough in number to ex- 
clude from our analysis. 


12. Not all states give all the tests (e.g., science test) to their 
students. While every state must have students take the math 
and reading tests at least every two years, the National Assess- 
ment Governing Board determines which other tests will be 
given to which states. 


13. Because Iowa lacks enough Asian fourth and eighth grade 
students to provide a reliable average from the NAEP sample, 
NAEP does not have scores for Asian fourth graders in any sub- 
ject or Asian eighth graders in science. This lowers the number 
of possible tests in Iowa from 24 to 20. 


14. Our rankings assume that students in each ethnic group are 
similar across states. Although this assumption may not always 
be correct, it is more realistic than the assumption made in other 
rankings that the entire student population is similar across states. 


15. For example, Washington, Utah, North Dakota, New Hamp- 
shire, Nebraska, and Minnesota also shut out Texas on all six 
tests (math, reading, science, 4th and 8th grades) under the as- 
sumption of homogeneous student populations. Nevertheless, 
Texas dominates all these states when comparisons are made 
using the full set of 24 exams that allow for student heteroge- 
neity. Six states change by more than 24 positions depending 
on whether they are ranked using aggregated or disaggregated 
NAEP scores. 


16. There are other categories in the NAEP data not directly 


related to race. Several of these (e.g., disability status, English 
language learner status, gender) have only minuscule effects on 
rankings and thus are ignored in our analysis. Among these non- 
racial factors, the most important is whether the student quali- 
fies for subsidized school lunches, a proxy for family income. 
We do not include this variable in our analysis because the in- 
come requirements determining which students qualify for 
subsidized lunches are the same for all states in the contiguous 
United States, despite considerable differences in cost of living 
between jurisdictions. High cost of living states can have costs 
85 percent higher than low cost of living states. High cost of liv- 
ing states will have fewer students qualify for subsidized lunch- 
es, and low cost of living states will have more students qualify 
than would be the case if cost of living adjustments were made. 
Because the distribution of cost of living values across states is 
not symmetrical, the difference in scores between students with 
subsidized lunches and students without, across states, is likely 
to be biased. This bias is pertinent to our examination of state 
education systems and student performance, so we exclude it 
from our analysis. Its inclusion would only have had a minor ef- 
fect on our rankings, however, since the correlation between a 
state ranking that includes this variable (at half the importance 
of the four equally weighted ethnicity variables) with one that 
excludes it is 0.92. A different nonracial variable is the parents’ 
education level, but this variable has the deficiency of only being 
available for eighth grade and not fourth grade students. 


17. While we would have preferred to include test scores for 12th 
grade students, the data were not sufficiently complete to do so. 
While the NAEP test is given to 12th graders, it was only given 
to a national sample of students in 2015, and the most recent 
state averages available are from 2013. Even these 2013 state av- 
erages did not have a sufficient number of students from many 
of the ethnic groups we consider, and many states lacked a large 
number of observations. Because of the relatively incomplete 
data for 12th graders, we chose to include only 4th and 8th grade 
test scores. Note that U.S. News only includes state averages for
8th grade math and reading tests in their rankings. 


18. When states do not report scores for each of the 24 NAEP 
categories, those states have their average scores calculated 
based on the exams that are reported. 


19. We equate the importance of each of the 24 tests by forming, 
for each of the 24 possible exams, a z-score for each state, under 
the assumption that these state test scores have a normal dis- 
tribution. The z-statistic for each observation is the difference 
between a particular state’s test score and the average score for 


all states, divided by the standard deviation of those scores over 
the states. Our overall ranking is merely the average z-score for 
each state. Thus, exams with greater variations or higher or low- 
er mean scores do not have greater weight than any other test 
in our sample. The z-score measures how many standard devia- 
tions a state is above or below the mean score calculated over all 
states. One might argue that we should use an average weighted 
by the share of students, but we choose to give each group equal 
importance. If we had used population weights, the rankings 
would not have changed very much because the correlation be- 
tween the two sets of scores is 0.86, and four of the top-five and 
four of the bottom-five states remain the same. 
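The averaging described in notes 18 and 19 can be sketched as follows. The exam names, state names, and scores are all hypothetical and are not NAEP data. The note does not say whether a population or sample standard deviation is used; the population form is assumed here.

```python
from statistics import mean, pstdev

# Hypothetical inputs: exam -> {state: average score}. None of these numbers are NAEP data.
scores = {
    "grade4_math_group_x": {"State1": 230.0, "State2": 240.0, "State3": 250.0},
    "grade8_reading_group_y": {"State1": 262.0, "State2": 258.0},  # State3 not reported
}


def exam_z_scores(exam_scores):
    """z-score of each state on one exam: (score - mean over states) / std over states."""
    values = list(exam_scores.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return {state: (score - mu) / sigma for state, score in exam_scores.items()}


def overall_z(all_scores):
    """Unweighted average of a state's z-scores over the exams it actually reports."""
    per_state = {}
    for exam_scores in all_scores.values():
        for state, z in exam_z_scores(exam_scores).items():
            per_state.setdefault(state, []).append(z)
    return {state: mean(zs) for state, zs in per_state.items()}


result = overall_z(scores)
ranking = sorted(result, key=result.get, reverse=True)
print(ranking)
```

Standardizing each exam before averaging keeps an exam with a wider spread of scores from dominating the overall figure, and averaging only over reported exams mirrors the treatment of missing categories in note 18.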


20. Without listing all of Florida’s 24 scores, its lowest 5 (out 
of the 51 states, in reverse order) are ranked 27, 21, 20, 19, and 
10. The rest are all ranked in the top 10, with 12 of Florida’s test 
scores among the top 5 states. 


21. Some 89 percent are college educated. See for example, 
David Alpert, “DC Has Almost No White Residents without 
College Degrees,” GGW.org, August 29, 2016, https://ggwash. 
org/view/42563/dc-has-almost-no-white-residents-without- 
college-degrees-its-a-different-story-for-black-residents. 


22. The statewide cost of living adjustments are taken from 
the Missouri Economic Research and Information Center’s 
Cost of Living Data Series 2017 Annual Average,
https://www.missourieconomy.org/indicators/cost_of_living.


23. It would be a mistake to use straightforward z-scores from 
Table 1 when constructing the “z-Score/$” variable because 
states with z-scores near zero and thus near one another would 
hardly differ even if their expenditures per student were very dif- 
ferent. Instead, we added 2.50 to each z-score so that all states 
have positive z-scores and the lowest state would have a revised 
z-score of 1. We then divided each state’s revised z-score by the 
expenditure per student to arrive at the values shown in Table 2. 
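A minimal sketch of this adjustment follows. The two states and their figures are hypothetical and chosen only to show why the shift matters: their z-scores are nearly equal, but one spends twice as much per student as the other.

```python
SHIFT = 2.50  # constant added to every z-score, as described in the note


def efficiency(z_score, spending_per_student):
    """Revised z-score per dollar: (z + 2.50) / expenditure per student."""
    return (z_score + SHIFT) / spending_per_student


# Hypothetical states (illustrative values, not taken from the tables).
frugal = {"z": 0.01, "spending": 8_000}
lavish = {"z": -0.01, "spending": 16_000}

raw_gap = frugal["z"] / frugal["spending"] - lavish["z"] / lavish["spending"]
shifted_ratio = (efficiency(frugal["z"], frugal["spending"])
                 / efficiency(lavish["z"], lavish["spending"]))

print(f"gap in raw z per dollar: {raw_gap:.2e}")
print(f"ratio of shifted efficiency scores: {shifted_ratio:.2f}")
```

Without the shift both raw values sit next to zero and can even differ in sign. With it, the state that spends half as much for nearly the same outcome scores about twice as high.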


24. Data on expenditures, student-teacher ratios, and share of 
students in charter schools are taken from the National Center 
for Education Statistics’ Digest of Education Statistics. Data on 
share of students in private schools come from the NCES’s Pri- 
vate School Universe Survey. Our variable for unionization is a 
2012 ranking of states constructed by researchers at the Thomas 
B. Fordham Institute, an education research organization, that 
used 37 different variables in five broad categories (Resources 
and Membership, Involvement in Politics, Scope in Bargain- 
ing, State Policies, and Perceived Influence). The ranking can 
be found in Amber M. Winkler, Janie Scull, and Dara Zeehandelaar,
“How Strong Are Teacher Unions? A State-By-State Comparison,”
Thomas B. Fordham Institute and Education Reform Now, 2012,
https://edexcellence.net/publications/how-strong-are-us-teacher-unions.html.


25. Because many states had results for fewer than 24 exams, 
giving each state equal weight in the overall average would not 
provide the zero “average” that would be expected if the average 
of every z-score were used to form the average. There were also
some rounding errors.


26. New Jersey, Vermont, Connecticut, Washington, D.C., 
Alaska, and New York all exceed this level. 


27. The decline is 0.15 z-units, which is about 5 percent of the 
total z-score range. 


28. We should also note that efficient use of money requires that
it be spent only up to the point where the marginal value of the
benefits falls below the marginal expenditure. Thus, the point
where increasing expenditures provides no additional value can- 
not be the efficient level of expenditure. Instead, the efficient 
level of expenditure must lie below that amount. 


29. This is a somewhat rough approximation because the ranks 
form a uniform distribution and the z-scores form a normal distribution
with the mass of observations near the mean. A movement
of a given z-distance will change ranks more if the movement occurs
near the mean than if the movement occurs near the tails.
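
The point made in note 29 can be sketched numerically. The Python snippet below assumes, purely for illustration, that state scores follow a standard normal distribution, and it approximates the number of rank positions crossed by a given z-movement as the share of the distribution passed over times the number of states. The function name and starting points are hypothetical.

```python
from statistics import NormalDist


def approx_rank_change(z_start, dz, n_states=51):
    """Approximate rank positions crossed when moving from z_start to z_start + dz.

    Assumes (for illustration only) standard-normal state scores.
    """
    nd = NormalDist()  # mean 0, standard deviation 1
    return n_states * (nd.cdf(z_start + dz) - nd.cdf(z_start))


# The same 0.15 z-unit movement, starting at the mean and starting in the tail.
near_mean = approx_rank_change(0.0, 0.15)
in_tail = approx_rank_change(1.5, 0.15)

print(f"rank positions crossed near the mean: {near_mean:.2f}")
print(f"rank positions crossed in the tail:   {in_tail:.2f}")
```

Under this assumption the same z-distance crosses more rank positions near the mean than in the tail, which is why a simple proportional conversion from z-scores to ranks is only approximate.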


30. For a review of the literature on unionization and student 
performance, see Joshua M. Cowen and Katharine O. Strunk, 
“The Impact of Teachers’ Unions on Educational Outcomes: 
What We Know and What We Need to Learn,” Economics of 
Education Review 48 (2015): 208-23. Earlier studies found posi- 
tive effects of unionization, but recent studies are more mixed. 
Most researchers agree that unionization likely affects different 
types of students differently. For studies that find unionization 
negatively affects student performance, see Caroline Minter 
Hoxby, “How Teachers’ Unions Affect Education Production,” 
Quarterly Journal of Economics 111, no. 3 (1996): 671-718,
https://doi.org/10.2307/2946669; and Geeta Kingdon and Francis
Teal, “Teacher Unions, Teacher Pay and Student Performance
in India: A Pupil Fixed Effects Approach,” Journal of Development
Economics 91, no. 2 (2010): 278-88,
https://doi.org/10.1016/j.jdeveco.2009.09.001. For studies that
find no effect of unionization, see Michael F. Lovenheim, “The
Effect of Teachers’ Unions on Education Production: Evidence
from Union Election Certifications in Three Midwestern States,”
Journal of Labor Economics 27, no. 4 (2009): 525-87,
https://doi.org/10.1086/605653.
More recently, only very small negative effects on student
performance were found in Bradley D. Marianno and Katharine O.
Strunk, “The Bad End of the Bargain? Revisiting the Relation- 
ship between Collective Bargaining Agreements and Student 
Achievement,” Economics of Education Review 65 (2018): 93-106, 
https://doi.org/10.1016/j.econedurev.2018.04.006. 


31. To arrive at this value, we multiply the coefficient (1.96) by 
50 percent to determine the change in z-score and then divide 
by 2.73, the range of z-scores among the states. This provides 
a value of 35.5 percent, indicating how much of the range in z- 
scores would be traversed as a result of the change in charter 
school students. This value is then multiplied by the 51 states in
the analysis to approximate the corresponding change in rank.
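
The arithmetic in note 31 can be retraced with the rounded figures quoted there. This is only a sketch: the variable names are illustrative, and because the inputs are rounded, the computed share may differ slightly from the 35.5 percent reported in the note.

```python
# Figures as quoted in note 31 (rounded).
coefficient = 1.96           # regression coefficient on charter school share
charter_share_change = 0.50  # a 50 percent change in charter school students
z_range = 2.73               # range of z-scores among the states
n_states = 51                # states in the analysis (including Washington, D.C.)

# Step 1: implied change in z-score.
z_change = coefficient * charter_share_change

# Step 2: share of the total z-score range traversed.
share_of_range = z_change / z_range

# Step 3: approximate number of rank positions traversed.
approx_rank_change = share_of_range * n_states

print(f"change in z-score:       {z_change:.2f}")
print(f"share of z-score range:  {share_of_range:.1%}")
print(f"approximate rank change: {approx_rank_change:.1f}")
```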


32. For a discussion on the empirical literature regarding school 
class size, see Edward P. Lazear, “Educational Production,”
Quarterly Journal of Economics 116, no. 3 (2001): 777-803,
https://doi.org/10.1162/00335530152466232. Lazear suggests that
differing student and teacher characteristics make it difficult to isolate
the effect of class size on student outcomes. This view, although 
spun in a more positive light, generally is supported in a more 
recent summary of the academic literature found in Grover J. 
Whitehurst and Matthew M. Chingos, “Class Size: What Research
Says and What It Means for State Policy,” Brown Center
on Education Policy, Brookings Institution, May 2011,
https://www.brookings.edu/research/class-size-what-research-says-and-what-it-means-for-state-policy/.


33. The score column in appendix Table 2 merely multiplies the 
score for each type of student at a school by the share of the 
student type in the school population and sums the amounts. 
For example, the 87.5 value for School A in ranking 1 is found by
multiplying the S1 score of 50 by 0.25 (= 12.5) and adding that to the
product of the population share of S2 (0.75) and the S2 score of
100 (= 75) in School A. This method is effectively what U.S. News
and other conventional rankings use. 
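
The aggregation described in note 33 amounts to a population-weighted average. A minimal Python sketch follows, using the School A figures quoted in the note; the function name and the dictionary layout are illustrative assumptions rather than anything taken from the appendix.

```python
def composite_score(scores, shares):
    """Weight each student type's score by its share of the school population."""
    return sum(scores[group] * shares[group] for group in scores)


# School A in ranking 1, using the figures given in note 33.
school_a_scores = {"S1": 50, "S2": 100}
school_a_shares = {"S1": 0.25, "S2": 0.75}

school_a = composite_score(school_a_scores, school_a_shares)
print(school_a)  # 12.5 + 75 = 87.5
```

Because the composite depends on the population shares as well as the scores, two schools with identical scores for each student type can receive different composites when their student mixes differ.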